llm-factor: migrate to candle
#2755
Conversation
Given this is a breaking change, I'd suggest adding the 3.0 label.
@radu-matei I do not believe I can add labels in this repository.
The test failure does not seem to be related?
crates/llm-local/src/utils.rs
Outdated
```rust
let json: serde_json::Value =
    serde_json::from_reader(&json_file).map_err(candle::Error::wrap)?;
let weight_map = match json.get("weight_map") {
    None => candle::bail!("no weight map in {json_file:?}"),
    Some(serde_json::Value::Object(map)) => map,
    Some(_) => candle::bail!("weight map in {json_file:?} is not a map"),
};
```
You can replace this with:

```rust
#[derive(Deserialize)]
struct SafeTensorsJson {
    weight_map: HashMap<String, String>,
}

let json: SafeTensorsJson = serde_json::from_reader(&json_file).map_err(candle::Error::wrap)?;
```
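For reference, a self-contained version of that suggestion might look like the sketch below. The `SafeTensorsJson` name and the wrapping `read_weight_map` function are illustrative assumptions, not code from this PR:

```rust
use std::{collections::HashMap, fs::File, path::Path};

use serde::Deserialize;

/// Shape of a safetensors index file (e.g. `model.safetensors.index.json`),
/// keeping only the field this code cares about.
#[derive(Deserialize)]
struct SafeTensorsJson {
    weight_map: HashMap<String, String>,
}

/// Hypothetical helper: read the index file and return its weight map.
fn read_weight_map(index_path: &Path) -> candle::Result<HashMap<String, String>> {
    let json_file = File::open(index_path).map_err(candle::Error::wrap)?;
    let json: SafeTensorsJson =
        serde_json::from_reader(&json_file).map_err(candle::Error::wrap)?;
    Ok(json.weight_map)
}
```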
I reverted this change and the other because it was leading to an odd error where the returned vector contained the same entries repeated several times, which meant the same files were being loaded over and over and led to consuming large amounts of memory.
crates/llm-local/src/utils.rs
Outdated
```rust
for value in weight_map.values() {
    if let Some(file) = value.as_str() {
        safetensors_files.insert(file.to_string());
    }
}
let safetensors_files = safetensors_files
    .iter()
    .map(|v| model_dir.join(v))
    .collect::<Vec<_>>();
```
Suggested change (replacing the block above):

```rust
safetensors_files.extend(weight_map.values().map(|v| model_dir.join(v)));
```
This assumes no need to call `as_str` because of the suggested change above.
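On the revert mentioned earlier: since several `weight_map` entries usually point at the same `.safetensors` shard, the deduplicating shape of the original code is what keeps each file from being collected and loaded repeatedly. A minimal sketch of that approach, with an assumed `shard_paths` helper for illustration:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Collect the distinct shard files referenced by the weight map and turn
/// them into paths under the model directory.
fn shard_paths(
    weight_map: &serde_json::Map<String, serde_json::Value>,
    model_dir: &Path,
) -> Vec<PathBuf> {
    let mut safetensors_files = HashSet::new();
    for value in weight_map.values() {
        if let Some(file) = value.as_str() {
            safetensors_files.insert(file.to_string());
        }
    }
    // Deduplicated by the HashSet above, so each shard is joined (and later
    // loaded) only once.
    safetensors_files
        .iter()
        .map(|v| model_dir.join(v))
        .collect()
}
```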
crates/llm-local/src/lib.rs
Outdated
```rust
}

#[async_trait]
trait CachedInferencingModel: Send + Sync {
```
Can we document this trait? What about it makes it `Cached`? Are implementors required to cache results, or does it just happen that the current implementors do?
I'm fine with keeping the name, but I personally find the name `CachedInferencingModel` confusing when implementors aren't required to cache anything. `InferencingModel` seems like a more appropriate name.
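Purely to illustrate the documentation request (this is not the PR's actual definition, and the `infer` signature is hypothetical), a documented and renamed version could read:

```rust
use async_trait::async_trait;

/// A locally loaded model that can run inferencing requests.
///
/// Nothing in this trait requires implementors to cache anything; the
/// previous `Cached` prefix only reflected that current implementations
/// keep loaded weights around between calls.
#[async_trait]
trait InferencingModel: Send + Sync {
    /// Hypothetical method signature, for illustration only.
    async fn infer(&self, prompt: &str) -> anyhow::Result<String>;
}
```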
```rust
match self.tokenizer.decode(tokens, true) {
    Ok(str) => Ok(str),
    Err(err) => anyhow::bail!("cannot decode: {err}"),
}
```
Suggested change (replacing the block above):

```rust
self.tokenizer.decode(tokens, true).context("failed to decode token stream")
```
It does look like I cannot do this, because `tokenizer.decode` returns a `Result<String, Box<dyn Error + Send + Sync>>`, which does not seem to be suitable to use `context` on(?)
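One possible workaround, if it were ever wanted (an assumption on my part, not something this PR does): convert the boxed error into an `anyhow::Error` first, since `anyhow!` accepts any `Debug + Display + Send + Sync + 'static` value, after which `context` is available:

```rust
use anyhow::{anyhow, Context};
use tokenizers::Tokenizer;

/// Tokenizer::decode returns Result<String, Box<dyn std::error::Error + Send + Sync>>,
/// and that boxed error does not itself implement std::error::Error, so anyhow's
/// Context cannot be applied directly. Converting it with anyhow! first makes
/// .context() usable.
fn decode_tokens(tokenizer: &Tokenizer, tokens: &[u32]) -> anyhow::Result<String> {
    tokenizer
        .decode(tokens, true)
        .map_err(|e| anyhow!(e))
        .context("failed to decode token stream")
}
```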
```rust
};
self.tokens.push(token);
let text = self.decode(&self.tokens[self.prev_index..])?;
if text.len() > prev_text.len() && text.chars().last().unwrap().is_alphanumeric() {
```
I don't fully understand what this check is supposed to be doing. Why do we care about the length of the new text vs the previous text, and why do we care whether the last character is alphanumeric?
The length check is to see if we have any new tokens. The alphanumeric check is supposed to check whether we have a valid token to decode. That is what I gather from the Python function the docs link to.
The Python code is dealing with unfinished UTF-8 byte sequences, which is not possible at this point in the Rust code. Rust `char`s are guaranteed to be valid UTF-8. The check for alphanumeric chars is checking that the character is `A-Z | a-z | 0-9`, which does seem to be what we want.
The `Tokenizer::decode` function returns `String`s, so I'm guessing the `tokenizer` crate is somehow taking care of byte sequences that aren't valid UTF-8?
Here is the relevant Rust version from which this is borrowed:
https://github.com/huggingface/candle/blob/6eea45a761fc1636b5e8012d02bdaa93321652ca/candle-examples/src/token_output_stream.rs#L43
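To make the check above concrete, here is a hedged sketch of that borrowed pattern; the `TokenStream` type and field names are illustrative, not the PR's exact code. The stream re-decodes the window of tokens since the last emitted text and only surfaces new output once the decoded text grows and ends on an alphanumeric character:

```rust
use anyhow::{anyhow, Result};
use tokenizers::Tokenizer;

/// Illustrative incremental decoder in the style of candle's
/// token_output_stream example.
struct TokenStream {
    tokenizer: Tokenizer,
    tokens: Vec<u32>,
    prev_index: usize,    // start of the token window not yet fully emitted
    current_index: usize, // end of the window already emitted as text
}

impl TokenStream {
    fn decode(&self, tokens: &[u32]) -> Result<String> {
        self.tokenizer.decode(tokens, true).map_err(|e| anyhow!(e))
    }

    /// Returns newly produced text, or None while the window still ends in
    /// the middle of a piece (for example, a partially decoded word).
    fn next_token(&mut self, token: u32) -> Result<Option<String>> {
        let prev_text = if self.tokens.is_empty() {
            String::new()
        } else {
            self.decode(&self.tokens[self.prev_index..self.current_index])?
        };
        self.tokens.push(token);
        // Re-decode the whole window including the new token and compare.
        let text = self.decode(&self.tokens[self.prev_index..])?;
        if text.len() > prev_text.len() && text.chars().last().unwrap().is_alphanumeric() {
            let new_text = text[prev_text.len()..].to_string();
            self.prev_index = self.current_index;
            self.current_index = self.tokens.len();
            Ok(Some(new_text))
        } else {
            Ok(None)
        }
    }
}
```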
🎉
This PR replaces the dependency on `rustformers/llm` with `huggingface/candle`. This allows us to run newer models like Llama 3(.1), and it now requires models to be in the `safetensors` format.

This PR also removes the concept of well-known models, which ensures a consistent directory structure for all models. The rationale is that, with this change, the only group of models initially supported is the Llama family.

Closes #2735